HIE from First Principles

Acronyms expanded in this post:

AI: Artificial Intelligence. Software that generates, classifies, predicts, summarizes, or acts on patterns in data.
CDA: Clinical Document Architecture. An older Health Level Seven standard for structured clinical documents.
EHR: Electronic Health Record. The clinical system where patient care is documented and managed.
FHIR: Fast Healthcare Interoperability Resources. The modern web-friendly Health Level Seven healthcare data exchange standard.
HIE: Health Information Exchange. The sharing of clinical information across organizations.
HL7: Health Level Seven. The family of healthcare messaging and data exchange standards.
HL7 v2: Health Level Seven version 2. The older event-message standard still running much hospital integration.
ICD: International Classification of Diseases. A diagnosis classification system used for reporting, billing, and statistics.
IT: Information Technology. The practice of building, operating, and supporting computing systems.
MPI: Master Patient Index. Identity logic used to decide whether records refer to the same person.
SNOMED CT: Systematized Nomenclature of Medicine Clinical Terms. A large clinical terminology for representing medical meaning.

Acronym crib sheet.

Health Information Exchange [HIE] means the organized sharing of health data across separate hospitals, clinics, labs, pharmacies, public health systems, and other platforms.
Open Health Information Exchange [OpenHIE] is an open architectural framework for building HIEs, especially in countries or regions where systems are fragmented, underfunded, and still expected to perform like Swiss watches.
Electronic Health Record [EHR] means the clinical system used to document patient care.
Health Level Seven version 2 [HL7 v2] is an older but still heavily used healthcare messaging standard for event-based system communication.
Fast Healthcare Interoperability Resources [FHIR] is a modern web-based standard for representing and exchanging healthcare data in smaller modular chunks called resources.
Clinical Document Architecture [CDA] is a document-based healthcare standard that carries both human-readable narrative and machine-readable structure.
Master Patient Index [MPI] is a service that links records belonging to the same person across different systems.
Client Registry [CR] is OpenHIE’s patient identity service.
Facility Registry [FR] is the authoritative list of health facilities.
Health Worker Registry [HWR] is the authoritative list of providers and health workers.
Terminology Service [TS] manages code systems, value sets, mappings, and controlled meanings.
Interoperability Layer [IOL] is the exchange layer that receives, validates, routes, transforms, secures, and logs communication between systems.
Shared Health Record [SHR] is a normalized repository of selected patient-level clinical information used for cross-system sharing.
Health Management Information System [HMIS] is a system used for aggregate health program reporting.
Logistics Management Information System [LMIS] is a system used for supplies, stock, and inventory movement.
International Classification of Diseases [ICD] is a disease classification system used for diagnosis coding and reporting.
Systematized Nomenclature of Medicine Clinical Terms [SNOMED CT] is a detailed clinical terminology used to represent clinical concepts more precisely.

HIE is not a pipe; it is a peace treaty between quarrelsome little kingdoms of data.

That is the first thing to understand, before some vendor in a crisp shirt arrives with a dashboard and the confidence of a man who has never watched an interface fail at 2:13 in the morning. Healthcare data does not sit in one neat pot waiting to be ladled out. It is born in clinics, wards, labs, pharmacies, insurance desks, public health offices, mobile apps, district reports, and, may the gods forgive us, Excel sheets with names like final_new_latest_corrected_v3.xlsx. Each place thinks it is describing the same patient. It is not. It is describing the patient as seen through that workflow’s keyhole.

A doctor sees a person with breathlessness. A billing system sees a chargeable encounter. A laboratory system sees a specimen and a result. A public health program sees a case count. A pharmacy sees a dispensation. A government officer sees an indicator. A frightened family sees a crisis. The machine receives all this and says, in its small metallic voice, “please provide a unique identifier.”

There, in miniature, is the problem.

The ordinary mind thinks HIE means moving data from one system to another. That is the easy part. Data moves beautifully. Data is a born traveler. It can be wrapped in HL7 v2, packaged in CDA, exposed through FHIR, dumped into comma-separated files, pushed through an application programming interface, or carried by a tired clerk on a pen drive in a trouser pocket. Transport is not the miracle. Meaning is the miracle. And meaning, unlike data, does not travel second class quietly.

Transport asks, “Did the message arrive?” Meaning asks, “What did the message mean when it was created, who believed it, what workflow produced it, which code system named it, what time did it refer to, and can the receiver safely act on it?” These are different questions. Confusing them is how expensive systems become expensive furniture.

OpenHIE is useful because it starts with this grown-up suspicion. It does not pretend that one grand database will save the republic. It does not say, “Put everything in one EHR and clap.” It says, more sensibly, “Healthcare has different kinds of truth, each with different owners, different rhythms, and different failure modes. Separate them before they strangle one another.”

This is not glamorous. Nobody writes love songs about registries. No child in south Calcutta grows up saying, “One day I shall build a facility registry and make my mother proud.” Yet this is where the real architecture lives. In healthcare, the humble nouns are dangerous: patient, facility, provider, diagnosis, observation, encounter, referral, specimen, medication. If these nouns wobble, the whole system wobbles. The dashboard may still glow like a wedding pandal, but underneath it the bamboo is cracking.

The CR answers the most basic question: who is this person. This sounds absurdly simple until you enter the real world, where one woman may be registered as Rekha Das in one clinic, R. D. in another, Rekha Dey after marriage somewhere else, and “wife of Bimal” in a community health worker’s phone. Birth dates may be guessed. Addresses may move. Names may be spelled by whoever had the keyboard that day. In multilingual places, the same person’s name can acquire several costumes. The CR does not magically solve identity. It manages the uncertainty so one human being does not become five ghosts or, worse, five human beings do not become one administrative monster.

The FR answers another deceptively dull question: where did this happen. A facility is not just a name on a signboard. It belongs to an administrative hierarchy, has services, ownership, geography, operating status, catchment logic, reporting obligations, referral relationships, and sometimes a habit of changing names after every new political season. If the facility list is rotten, everything downstream smells. Disease rates by district, stock planning, workforce distribution, maternal health reporting, referral analysis, even payment logic—each begins to stagger like a man getting off a bus near Esplanade after three hours in traffic.

The HWR answers who did it, or who was allowed to do it, or who signed it, ordered it, verified it, referred it, dispensed it, supervised it, or pretended not to see it. Health systems love to talk about accountability. Accountability without provider identity is theatre with a spreadsheet.

The TS answers what the words and codes mean. This is the one people ignore until they are deep in trouble. A diagnosis code is not a diagnosis in the same way a railway ticket is not a journey. A local code for “high blood pressure” may map roughly to an ICD category, but the clinician may have meant uncontrolled hypertension, historical hypertension, pregnancy-related hypertension, or “patient once told me this while I was trying to finish outpatient clinic before lunch.” A blood pressure reading may be a fresh measurement, a copied note, a home value, a device feed, or a number typed because the form refused to close without it. The digits may travel perfectly. The meaning may arrive limping.

This is why many so-called data quality problems are not data quality problems at all. They are representation failures. The source system represented reality for one purpose. The receiving system quietly assumes it was represented for another. Then everyone scolds “bad data,” as if the data had character defects.

A child’s immunization record in a clinic system may be perfectly adequate for the nurse administering vaccines. But when a national dashboard asks whether the child belongs to a particular cohort, in a particular catchment, under a particular program definition, at a particular reporting date, the same record may suddenly look incomplete. The nurse did not fail. The original representation did not contain the later question. This is like blaming a fish for not being a bicycle, a common bureaucratic pastime.

The IOL sits in the middle and performs the unromantic but essential work of exchange. It authenticates. It authorizes. It receives messages. It validates them. It routes them. It may transform them. It logs success and failure. It retries. It complains. It is part traffic police, part customs officer, part interpreter, part night watchman.
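
That sequence of duties can be sketched in a few lines. This is a minimal illustration, not an OpenHIE component: the names `Message` and `InteropLayer`, the `patient_id` validation rule, and the retry count are all assumptions made for the example. Looking a sender up in a table stands in for real authentication, and the transformation step is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    sender: str   # hypothetical client identifier
    kind: str     # hypothetical message type, e.g. "immunization-update"
    body: dict


@dataclass
class InteropLayer:
    allowed: dict[str, set[str]]                 # sender -> kinds it may send
    routes: dict[str, Callable[[dict], None]]    # kind -> destination handler
    max_attempts: int = 3
    audit_log: list = field(default_factory=list)

    def handle(self, msg: Message) -> str:
        outcome = self._process(msg)
        # Success and failure are logged alike.
        self.audit_log.append((msg.sender, msg.kind, outcome))
        return outcome

    def _process(self, msg: Message) -> str:
        if msg.sender not in self.allowed:              # authenticate (simplified)
            return "rejected: unknown sender"
        if msg.kind not in self.allowed[msg.sender]:    # authorize
            return "rejected: not authorized"
        if "patient_id" not in msg.body:                # validate (toy rule)
            return "rejected: invalid message"
        handler = self.routes.get(msg.kind)             # route
        if handler is None:
            return "rejected: no route"
        for attempt in range(1, self.max_attempts + 1):  # retry
            try:
                handler(msg.body)
                return f"delivered on attempt {attempt}"
            except ConnectionError:
                continue
        return "failed: retries exhausted"


# A destination that fails once, then recovers.
delivered = []
attempts = {"count": 0}

def flaky_shared_record(body: dict) -> None:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("downstream unavailable")
    delivered.append(body)

iol = InteropLayer(
    allowed={"clinic-a": {"immunization-update"}},
    routes={"immunization-update": flaky_shared_record},
)
ok = iol.handle(Message("clinic-a", "immunization-update", {"patient_id": "P1"}))
denied = iol.handle(Message("clinic-b", "immunization-update", {"patient_id": "P2"}))
```

Every rule in the sketch is handed to the layer from outside: who may send what, and where it goes. The layer enforces those tables. It does not write them.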

But the IOL is not the truth. This point must be tattooed somewhere visible in every integration office. Middleware can enforce rules, but it cannot invent governance. An interface engine can route an immunization update to the right service. It cannot decide which national vaccine schedule is authoritative. It can transform a local facility code into a national facility identifier. It cannot decide who owns the facility list. It can reject a malformed message. It cannot know, by divine instinct, whether a clinician entered a diagnosis as suspected, confirmed, historical, billing-related, or merely convenient.

That decision belongs to architecture and governance, which are really the same animal seen under different lighting.

The SHR is where people often get overexcited. They imagine a national patient record, complete and shining, into which every fact flows, purified like Ganga water in an optimistic government advertisement. In reality, the SHR should contain selected, normalized, shareable clinical information useful across systems. It is not the entire EHR. It is not the analytics warehouse. It is not the legal record for every encounter. It is not a magic attic for every field anyone ever created.

This distinction matters. A shared record is built for continuity of care and cross-system retrieval. A warehouse is built for analysis, cohorts, reporting, trends, and denormalized querying. A transactional EHR is built for local clinical workflow. These are cousins, not triplets. When one system is forced to act like all three, it develops the personality of a badly run joint family: everyone lives in the same house, no one knows who owns the pressure cooker, and every decision requires shouting.

OpenHIE’s deeper architectural wisdom is that different healthcare facts have different half-lives. Patient demographics change, but slowly. Facility hierarchies change by policy, politics, and administrative rearrangement. Provider roles change with employment, credentialing, transfer, and death by paperwork. Clinical events happen quickly and may later be corrected. Terminology changes by versioning, mapping, program requirements, and local usage. If you bind all of this into one giant model, you have not created elegance. You have created a concrete overcoat.

The better pattern is domain separation. Let patient identity be managed as patient identity. Let facility authority be managed as facility authority. Let provider identity and roles have their own governance. Let terminology be treated as a serious semantic asset, not a table called lookup_master maintained by a man named Debu who left the project in 2019. Let clinical exchange pass through a controlled IOL. Let the SHR carry what must be shared, not everything that ever happened.

This is where first principles rescue us from software shopping. Before asking “Which platform?” ask “Which truth?” Before asking “Which standard?” ask “Which workflow boundary?” Before asking “FHIR or HL7 v2?” ask “What meaning must survive the crossing?”

FHIR is excellent when you need modern web-style exchange, granular resources, profiles, implementation guides, and cleaner application integration. HL7 v2 remains very useful for operational event messages, especially in hospitals where admission, discharge, transfer, orders, and results have been moving that way for decades. CDA still has value when the document itself matters, when narrative and structure must travel together, and when a clinical note must remain intelligible to a human being instead of being chopped into machine-friendly confetti.

The standard is not the architecture. It is a language. A language helps only when the speakers agree what they are talking about.

FHIR can represent a Patient resource beautifully while the real patient is duplicated three times. It can represent an Observation resource while nobody knows whether the measurement was sitting, standing, copied, device-generated, clinician-entered, or imported. It can represent a Condition resource while the receiving system cannot tell whether the condition is active, suspected, historical, ruled out, billing-selected, or carried forward from the Jurassic period of the chart. This is not an insult to FHIR. FHIR is a good tool. But a good envelope does not improve a foolish letter.
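
To make the Condition point concrete: FHIR R4 gives a Condition separate `clinicalStatus` and `verificationStatus` elements, but nothing forces a receiver to read them. The sketch below assumes R4-style JSON already parsed into a Python dict. The function `describe_condition` and its three labels are invented for illustration and belong to no FHIR library.

```python
CLINICAL = "http://terminology.hl7.org/CodeSystem/condition-clinical"
VERIFICATION = "http://terminology.hl7.org/CodeSystem/condition-ver-status"


def code_from(concept, system):
    """Return the first code from `system` in a CodeableConcept, or None."""
    for coding in (concept or {}).get("coding", []):
        if coding.get("system") == system:
            return coding.get("code")
    return None


def describe_condition(condition: dict) -> str:
    """Hypothetical triage: say how much the receiver may assume."""
    clinical = code_from(condition.get("clinicalStatus"), CLINICAL)
    verification = code_from(condition.get("verificationStatus"), VERIFICATION)
    if verification in {"refuted", "entered-in-error"}:
        return "do not act"
    if clinical == "active" and verification == "confirmed":
        return "active and confirmed"
    # Missing or weaker statuses: the envelope is valid, the letter is unclear.
    return "needs human review"


explicit = {
    "resourceType": "Condition",
    "code": {"text": "Hypertension"},
    "clinicalStatus": {"coding": [{"system": CLINICAL, "code": "active"}]},
    "verificationStatus": {"coding": [{"system": VERIFICATION, "code": "confirmed"}]},
}
bare = {"resourceType": "Condition", "code": {"text": "Hypertension"}}

explicit_label = describe_condition(explicit)
bare_label = describe_condition(bare)
```

Both dicts look like the same diagnosis to a dashboard. Only one of them says what the sender actually believed.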

The same is true of HL7 v2. It has served healthcare for a long time, often in heroic, ugly, pipe-smoking fashion. It carries events well. It is everywhere because hospitals are historical creatures. But it often carries context in segments and local conventions that only the sending and receiving systems understand after years of mutual irritation. Break that local understanding, and the message becomes a telegram from an uncle with poor handwriting.
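
A small sketch of what that local understanding looks like on the wire. The message below is made up for illustration, and the `ZBP` segment is a hypothetical site-defined Z-segment, not anything from the standard. The parser is deliberately naive: it splits on carriage returns and pipes and ignores escape sequences, repetitions, and subcomponents.

```python
# Segments are separated by carriage returns, fields by "|".
RAW = "\r".join([
    r"MSH|^~\&|LAB|HOSP|HIE|HIE|202401031030||ORU^R01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DAS^REKHA",
    "OBX|1|NM|BP-SYS^Systolic^L||140|mm[Hg]|||||F",
    "ZBP|1|HOME",  # hypothetical local segment: where the reading was taken
])


def parse(raw: str) -> dict:
    """Group fields by segment name, preserving order within each name."""
    segments = {}
    for line in raw.split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


msg = parse(RAW)

# In MSH the field separator itself counts as MSH-1,
# so MSH-9 (message type) sits at list index 8.
message_type = msg["MSH"][0][8]
patient_id = msg["PID"][0][3].split("^")[0]     # PID-3, first component
value, unit = msg["OBX"][0][5], msg["OBX"][0][6]  # OBX-5 and OBX-6
local_context = msg["ZBP"][0][2]                # meaning known only by agreement
```

Any receiver can extract `140` and `mm[Hg]`. Only a receiver who was told about `ZBP` knows the reading came from a home device.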

The real enemy is not old standards. The enemy is false confidence.

Identity matching is one source of false confidence. People say, “We will use deterministic and probabilistic matching,” and then look pleased, as if they have placed a mosquito net over the whole swamp. But identity matching is a continuous operational burden. False positives merge separate lives. False negatives split one life into pieces. Both are dangerous. In clinical care, the cost is not merely statistical. Wrong identity can mean wrong history, wrong allergy, wrong medication, wrong mother, wrong child. One does not need a PhD to understand the horror. One only needs imagination and a little professional fear.
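
A minimal sketch of why the phrase "deterministic and probabilistic matching" settles so little. The field names, weights, and thresholds here are assumptions chosen for the example, and `difflib.SequenceMatcher` stands in for whatever string comparison a real registry would use. The part worth noticing is the middle band, where the honest answer is "ask a human."

```python
from difflib import SequenceMatcher

# Illustrative weights and thresholds, not recommendations.
WEIGHTS = {"name": 0.5, "birth_date": 0.3, "village": 0.2}
LINK_AT = 0.95
REVIEW_AT = 0.60


def similarity(a, b) -> float:
    """String similarity in [0, 1]; a missing value counts as no evidence."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(a: dict, b: dict):
    # Deterministic rule: a shared, non-empty identifier links outright.
    if a.get("national_id") and a.get("national_id") == b.get("national_id"):
        return "link", 1.0
    # Probabilistic-style rule: weighted similarity across demographic fields.
    score = sum(w * similarity(a.get(f), b.get(f)) for f, w in WEIGHTS.items())
    if score >= LINK_AT:
        decision = "link"
    elif score >= REVIEW_AT:
        decision = "review"   # neither merge nor split automatically
    else:
        decision = "no-link"
    return decision, round(score, 2)


clinic = {"national_id": "NID-1", "name": "Rekha Das",
          "birth_date": "1988-04-12", "village": "Barasat"}
lab = {"national_id": "NID-1", "name": "R. D.",
       "birth_date": None, "village": None}
after_marriage = {"name": "Rekha Dey",
                  "birth_date": "1988-04-12", "village": "Barasat"}
stranger = {"name": "Bimal Roy",
            "birth_date": "1951-11-30", "village": "Howrah"}

same_id = match(clinic, lab)
name_changed = match(clinic, after_marriage)
unrelated = match(clinic, stranger)
```

Move `LINK_AT` down and the married-name record merges silently. Move `REVIEW_AT` up and it splits silently. Someone has to own those two numbers and the review queue between them.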

Terminology mapping is another swamp. Mapping local concepts to national or international codes looks innocent in a spreadsheet. It is not. One local term may map to many standard concepts. Many local terms may map to one standard concept. Some terms are too vague. Some are workflow shortcuts. Some are political inventions. Some were created because the software had a dropdown and the clinic had a life. A TS helps, but only if it is governed. Otherwise it becomes another museum of approximations.
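
The many-to-one and one-to-many cases can be shown with a toy mapping table. The local codes, the `Mapping` structure, the map version label, and the `translate` function are all hypothetical. The ICD-10 codes appear only as familiar targets and should be checked against whichever release is actually in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    local_code: str
    target_code: str     # ICD-10 code, for illustration only
    relationship: str    # "equivalent" or "candidate"
    map_version: str     # which version of the map asserted this


MAPPINGS = [
    # Many local terms -> one standard concept.
    Mapping("HTN", "I10", "equivalent", "2024-01"),
    Mapping("BP-HIGH", "I10", "equivalent", "2024-01"),
    # One vague local term -> several possible standard concepts.
    Mapping("HBP", "I10", "candidate", "2024-01"),
    Mapping("HBP", "O13", "candidate", "2024-01"),
]


def translate(local_code: str) -> dict:
    rows = [m for m in MAPPINGS if m.local_code == local_code]
    if not rows:
        return {"status": "unmapped"}
    exact = [m for m in rows if m.relationship == "equivalent"]
    if len(exact) == 1:
        return {"status": "mapped", "target": exact[0].target_code,
                "map_version": exact[0].map_version}
    # Refuse to guess: hand the candidates to a person or a governed rule.
    return {"status": "ambiguous",
            "candidates": sorted(m.target_code for m in rows)}


clean = translate("HTN")
vague = translate("HBP")
missing = translate("DROPDOWN-7")
```

The useful outcome is the `ambiguous` one. A mapping table that always returns exactly one answer has usually hidden a decision nobody was asked to make.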

Time is the third villain. Healthcare data has too many clocks. There is when the event occurred, when it was documented, when it was corrected, when it was transmitted, when it was received, when it was posted, when it was queried, and when someone finally looked at the dashboard and panicked. If your architecture collapses all of these into one timestamp, it is not simplifying. It is committing a quiet little crime.

A lab specimen may be collected on Monday, received on Tuesday, resulted on Wednesday, corrected on Thursday, and viewed by a clinician on Friday. Which date matters? The answer depends on the question. For clinical action, result time may matter. For turnaround time, collection and result times matter. For surveillance, reporting time may matter. For audit, correction time matters. Architecture must preserve this plurality instead of ironing it flat.
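
The Monday-to-Friday specimen translates into a record with one field per clock. This is a sketch under assumptions: the class `LabResultTimes`, its field names, and the question labels are invented here, and the dates are simply a week that starts on a Monday.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LabResultTimes:
    collected_at: datetime
    received_at: datetime
    resulted_at: datetime
    corrected_at: Optional[datetime] = None
    reported_at: Optional[datetime] = None   # e.g. sent to surveillance
    viewed_at: Optional[datetime] = None


# Which clock answers which question (labels are hypothetical).
CLOCK_FOR = {
    "clinical_action": "resulted_at",
    "surveillance": "reported_at",
    "audit": "corrected_at",
}


def clock_for(times: LabResultTimes, question: str) -> Optional[datetime]:
    return getattr(times, CLOCK_FOR[question])


def turnaround(times: LabResultTimes):
    # Turnaround needs two clocks, not one.
    return times.resulted_at - times.collected_at


times = LabResultTimes(
    collected_at=datetime(2024, 1, 1, 9, 0),   # Monday
    received_at=datetime(2024, 1, 2, 9, 0),    # Tuesday
    resulted_at=datetime(2024, 1, 3, 9, 0),    # Wednesday
    corrected_at=datetime(2024, 1, 4, 9, 0),   # Thursday
    viewed_at=datetime(2024, 1, 5, 9, 0),      # Friday
)

lab_turnaround = turnaround(times)
audit_clock = clock_for(times, "audit")
surveillance_clock = clock_for(times, "surveillance")  # never reported: None
```

A single `timestamp` column could hold any one of these values, and the schema would not tell you which.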

The practical point is plain: build provenance from the beginning. Every important exchanged fact should know where it came from, who asserted it, when it was created, whether it was transformed, which terminology version applied, and whether it replaced an earlier fact. Provenance is not decorative metadata. It is the receipt you need when the system is later accused of lying.
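
As a rough sketch, that checklist becomes a small record attached to every shared fact. The `Provenance` and `Fact` classes, their field names, and the sample values are assumptions for illustration. FHIR has its own Provenance resource, which this does not attempt to reproduce.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    source_system: str             # where it came from
    asserted_by: str               # who asserted it
    created_at: datetime           # when it was created
    terminology_version: str       # which terminology version applied
    transformations: tuple = ()    # what was done to it on the way
    replaces: Optional[str] = None # id of the earlier fact, if any


@dataclass(frozen=True)
class Fact:
    fact_id: str
    value: dict
    provenance: Provenance


def lineage(fact_id: str, store: dict) -> list:
    """Walk the `replaces` chain from the newest fact back to the original."""
    chain = []
    current = store.get(fact_id)
    while current is not None:
        chain.append(current.fact_id)
        earlier = current.provenance.replaces
        current = store.get(earlier) if earlier else None
    return chain


original = Fact("f1", {"test": "potassium", "value": 6.1},
                Provenance("lab-system", "technician-17",
                           datetime(2024, 1, 3, 9, 0), "local-codes-v4"))
corrected = Fact("f2", {"test": "potassium", "value": 5.1},
                 Provenance("lab-system", "pathologist-3",
                            datetime(2024, 1, 4, 9, 0), "local-codes-v4",
                            transformations=("unit normalized by exchange layer",),
                            replaces="f1"))

store = {f.fact_id: f for f in (original, corrected)}
history = lineage("f2", store)
```

Nothing is overwritten. The corrected value points at the one it replaced, so the question "what did the system believe on Wednesday?" still has an answer.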

There is also the old question of early-binding and late-binding transformation. Early-binding means you force incoming data into your canonical model as soon as it arrives. This makes life look cleaner. Dashboards become easier. Downstream systems clap politely. But you may lose context forever. Late-binding means you preserve more source detail and interpret later depending on use. This is richer but harder. It demands better storage, better metadata, better analysts, and more honest governance.

In real life you need both. Bind early where the meaning is stable and the use is clear. Bind late where context matters and future questions are uncertain. Do not bind early merely because governance is weak and everyone is tired. That is how architecture becomes a cupboard full of old compromises.
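
One way to hold both at once, sketched under assumptions: the stable fields are bound on arrival, the original payload is kept verbatim, and interpretation is deferred to a function written for a specific question. The payload shape, the `bp_source` field, and the "clinic readings only" indicator rule are all hypothetical.

```python
import json


def ingest(raw_text: str, store: list) -> None:
    payload = json.loads(raw_text)
    store.append({
        # Early-bound: meaning is stable and the use is clear.
        "patient_id": payload["patient_id"],
        "recorded_at": payload["recorded_at"],
        # Late-bound: keep the source exactly as it arrived.
        "raw": raw_text,
    })


def systolic_for_indicator(record: dict):
    """Interpret the raw payload for one question, decided at query time.

    Hypothetical rule: the indicator counts only clinic-measured readings.
    """
    payload = json.loads(record["raw"])
    if payload.get("bp_source") != "clinic":
        return None
    return int(payload["bp"].split("/")[0])


store = []
ingest(json.dumps({"patient_id": "P1", "recorded_at": "2024-01-03T09:00",
                   "bp": "140/90", "bp_source": "clinic"}), store)
ingest(json.dumps({"patient_id": "P2", "recorded_at": "2024-01-03T09:05",
                   "bp": "150/95", "bp_source": "home"}), store)

indicator_values = [systolic_for_indicator(r) for r in store]
```

Had `bp_source` been dropped at ingestion to keep the canonical model tidy, the second reading would have been counted, and no later query could have told the two apart.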

OpenHIE gives a practical way to organize the compromise. It does not promise heavenly cleanliness. It gives you compartments where different messes can be handled by the right authority. Patient identity in the CR. Facility authority in the FR. Worker identity in the HWR. Controlled meanings in the TS. Clinical sharing through the SHR. Movement and orchestration through the IOL. Point-of-service systems continue doing their local work. The exchange does not abolish them. It disciplines the crossing.

This is especially relevant in places like India, though the principle applies anywhere. We are very fond of giant declarations. National digital this, universal platform that, one health stack, one identifier, one dashboard, one more splendid abstraction with a logo. But healthcare on the ground is not a logo. It is an old man in a crowded outpatient department, a nurse juggling paper and software, a lab technician entering results after power fluctuations, a private clinic guarding its data, a government program chasing indicators, a patient moving between public and private care with no single thread tying the story together. Architecture that ignores this is not ambitious. It is illiterate.

The realistic constraint is that no clean solution exists. Legacy systems will remain. Private providers will not happily surrender data without incentives and protections. Public systems will have uneven infrastructure. Governance will be slower than software. Programs will defend their turf. Standards will be implemented partially. Funding will arrive in bursts. People will keep spreadsheets because spreadsheets are the cockroaches of information civilization: ugly, resilient, and always there after the bomb.

So the answer is not purity. The answer is disciplined imperfection.

Start with the registries that stabilize the nouns. Build or strengthen the CR if patient identity is breaking care continuity. Establish the FR before pretending district-level reporting is accurate. Create the HWR where provider roles and accountability matter. Stand up a TS before local code chaos becomes national analytics. Use the IOL to control exchange, not to hide every unresolved governance question in scripts. Use the SHR for selected shareable clinical facts. Build analytics separately, with its own model, latency expectations, and reproducibility rules.

And above all, stop calling everything “data quality.” It is too lazy. Sometimes the data is wrong. Often the representation is wrong for the new purpose. A value captured for bedside care is not automatically fit for public health reporting. A billing diagnosis is not automatically a clinical truth. A local facility name is not automatically a national identifier. A transported message is not automatically shared meaning. These distinctions are not academic embroidery. They decide whether the system works.

The first principle of HIE, then, is almost embarrassingly human: before systems can share data, institutions must agree what is being shared, who is allowed to say it, what it means, how long it remains valid, and what should happen when it is wrong.

OpenHIE does not remove the difficulty. It gives the difficulty proper rooms to live in. That alone is a large mercy. In healthcare IT, as in old Calcutta houses, the problem is rarely that things exist. The problem is that everything has been piled into one room for twenty years, the wiring is hidden behind damp plaster, the labels have fallen off, and someone is now asking why the fan switch turns on the bathroom light.
